Hyperspacings and the Estimation of Information Theoretic Quantities

Author

  • Erik G. Learned-Miller
Abstract

The estimation of probability densities from data is widely used as an intermediate step in the estimation of entropy, Kullback-Leibler (KL) divergence, and mutual information, and for statistical tasks such as hypothesis testing. We propose an alternative to density estimation: partitioning a space into regions whose approximate probability mass is known. Such a partition can be used for the same purposes. We call these regions hyperspacings, a generalization of spacings in one dimension. After discussing one-dimensional spacings estimates of entropy and KL divergence, we show how hyperspacings can be used to estimate these quantities (and mutual information) in higher dimensions. Our approach outperforms certain widely used estimators based on intermediate density estimates. Using similar ideas, we also present a new distribution-free hypothesis test for distributional equivalence that compares favorably with the Kolmogorov-Smirnov test. Using hyperspacings, it is easily extended to multiple dimensions.
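To make the one-dimensional case concrete, here is a minimal sketch of a standard m-spacing entropy estimator of the kind the abstract alludes to. It is not taken from the paper: the function name `m_spacing_entropy`, the default choice of m near the square root of the sample size, and the exact normalization (n + 1)/m are assumptions chosen for illustration, and the paper's own estimator may differ in detail.

```python
import math


def m_spacing_entropy(samples, m=None):
    """Estimate the differential entropy (in nats) of a 1-D sample.

    Illustrative m-spacing estimator (an assumption, not the paper's code):
        H ~= 1/(n-m) * sum_i log( (n+1)/m * (z[i+m] - z[i]) )
    where z is the sorted sample. Each m-spacing is treated as a region
    holding roughly m/(n+1) of the probability mass.
    """
    z = sorted(samples)
    n = len(z)
    if n < 2:
        raise ValueError("need at least two samples")
    if m is None:
        # Heuristic default (assumption): m grows like sqrt(n).
        m = max(1, round(math.sqrt(n)))
    if not 1 <= m < n:
        raise ValueError("m must satisfy 1 <= m < n")

    total = 0.0
    for i in range(n - m):
        spacing = z[i + m] - z[i]
        # A zero spacing (m+1 tied values) makes math.log raise ValueError;
        # this sketch assumes samples from a continuous distribution.
        total += math.log((n + 1) / m * spacing)
    return total / (n - m)
```

On an evenly spaced grid over (0, 1), every m-spacing equals m/(n + 1), so the estimate is approximately 0, which matches the entropy of the uniform distribution on the unit interval. Doubling every sample shifts the estimate by log 2, as expected for differential entropy under scaling.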


Similar resources

Estimating Functions of Distributions Defined over Spaces of Unknown Size

We consider Bayesian estimation of information-theoretic quantities from data, using a Dirichlet prior. Acknowledging the uncertainty of the event space size m and the Dirichlet prior’s concentration parameter c, we treat both as random variables set by a hyperprior. We show that the associated hyperprior, P(c,m), obeys a simple “Irrelevance of Unseen Variables” (IUV) desideratum iff P(c,m) =...


On Lower Bounds for Statistical Learning Theory

In recent years, tools from information theory have played an increasingly prevalent role in statistical machine learning. In addition to developing efficient, computationally feasible algorithms for analyzing complex datasets, it is of theoretical importance to determine whether such algorithms are “optimal” in the sense that no other algorithm can lead to smaller statistical error. This paper...


Combination of real options and game-theoretic approach in investment analysis

Technology investments account for a large amount of capital investment by major companies. Assessing such investment projects is critical to the efficient allocation of resources. Viewing investment projects as real options, this paper extends a method for assessing technology investment decisions in the joint presence of uncertainty and competition. It combines the game-theore...


An Information-Theoretic Discussion of Convolutional Bottleneck Features for Robust Speech Recognition

Convolutional Neural Networks (CNNs) have proven effective in speech recognition systems, both for extracting features and for acoustic modeling. In addition, CNNs have been used for robust speech recognition, and competitive results have been reported. A Convolutive Bottleneck Network (CBN) is a kind of CNN that has a bottleneck layer among its fully connected layers. The bottleneck fea...


Information Theoretic Analysis of Connection Structure from Spike Trains

We have attempted to use information-theoretic quantities for analyzing neuronal connection structure from spike trains. Two-point mutual information and its maximum value, channel capacity, between a pair of neurons were found to be useful for sensitive detection of cross-correlation and for estimation of synaptic strength, respectively. Three-point mutual information among three neurons could...




Publication date: 2004